Coordinated Weighted Sampling for Estimating Aggregates Over Multiple Weight Assignments
Authors
Abstract
Many data sources are naturally modeled by multiple weight assignments over a set of keys: snapshots of an evolving database at multiple points in time, measurements collected over multiple time periods, requests for resources served at multiple locations, and records with multiple numeric attributes. Over such vector-weighted data we are interested in aggregates with respect to one set of weights, such as weighted sums, and in aggregates over multiple sets of weights, such as the L1 difference. Sample-based summarization is highly effective for data sets that are too large to be stored or manipulated. The summary supports approximate processing of queries that may be specified after the summary is generated. Current designs, however, are geared toward data sets where a single scalar weight is associated with each key. We develop a sampling framework based on coordinated weighted samples that is suited for multiple weight assignments, and we obtain estimators that are orders of magnitude tighter than previously possible. We demonstrate the power of our methods through an extensive empirical evaluation on diverse data sets, ranging from IP network data to stock quotes.
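To make "coordinated weighted samples" concrete, here is a minimal sketch that is not the paper's construction. It assumes Poisson PPS sampling (key k kept with probability min(1, w(k)/tau)) with one common threshold `tau` for both weight assignments, and a per-key seed u(k) derived from a hash of the key so that every assignment reuses the same randomness. Under that assumption, a key appears in the union of two samples exactly when u(k) < min(1, max(w1, w2)/tau), and its maximum weight is then known. It appears in the intersection exactly when u(k) < min(1, min(w1, w2)/tau), and then both weights are known. Since |w1 - w2| = max - min, Horvitz-Thompson estimates of the sum of maxima and the sum of minima give an unbiased (possibly negative) estimate of the L1 difference. All function names are hypothetical.

```python
import hashlib


def seed(key: str) -> float:
    """Hypothetical shared seed u(key) in [0, 1), taken from a SHA-256 hash of the key.
    Reusing the same seed for every weight assignment is what coordinates the samples."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def inclusion_prob(w: float, tau: float) -> float:
    """Poisson PPS inclusion probability min(1, w / tau); assumes tau > 0."""
    return min(1.0, w / tau)


def poisson_pps_sample(weights: dict[str, float], tau: float) -> dict[str, float]:
    """Keep key k iff u(k) < min(1, w(k) / tau). Zero-weight keys are never kept."""
    return {k: w for k, w in weights.items() if seed(k) < inclusion_prob(w, tau)}


def ht_sum(sample: dict[str, float], tau: float) -> float:
    """Horvitz-Thompson estimate of the total weight of one assignment."""
    return sum(w / inclusion_prob(w, tau) for w in sample.values())


def estimate_l1(s1: dict[str, float], s2: dict[str, float], tau: float) -> float:
    """Estimate sum_k |w1(k) - w2(k)| from two coordinated samples with the same tau,
    as (HT estimate of sum of maxima) - (HT estimate of sum of minima)."""
    # Union: the sampled weight is the maximum, because a key missing from one sample
    # has weight at most u(k) * tau there, below the weight seen in the other sample.
    max_est = 0.0
    for k in s1.keys() | s2.keys():
        m = max(s1.get(k, 0.0), s2.get(k, 0.0))
        max_est += m / inclusion_prob(m, tau)
    # Intersection: both weights are known, so the minimum is known.
    min_est = 0.0
    for k in s1.keys() & s2.keys():
        m = min(s1[k], s2[k])
        min_est += m / inclusion_prob(m, tau)
    return max_est - min_est


if __name__ == "__main__":
    w1 = {"a": 5.0, "b": 1.0, "c": 0.0}
    w2 = {"a": 2.0, "b": 4.0, "d": 3.0}
    tau = 0.5  # every positive weight exceeds tau, so sampling is exhaustive here
    s1, s2 = poisson_pps_sample(w1, tau), poisson_pps_sample(w2, tau)
    print(estimate_l1(s1, s2, tau))  # 9.0, exact because every key is sampled
```

With a larger `tau` the samples shrink, and the shared seed keeps them nested whenever one assignment dominates the other key by key. That nesting is what lets the union and intersection expose the maxima and minima.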
Journal title:
- PVLDB
Volume: 2, Issue:
Pages: -
Publication date: 2009